Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

Authors

  • Sh. Rafieian Computer Engineering Department, Sheikh Bahaii University, Isfahan, Iran
Abstract:

With regard to authors' rights, plagiarism detection is one of the critical problems in text mining and attracts many researchers. The issue is taken seriously in institutions of higher education. Existing language-independent tools do not yield reliable results, because they ignore the particular features of each language. Given the scarcity of work on Persian and the lack of reliable Persian plagiarism checkers, a method is needed that improves the accuracy of detecting plagiarized Persian phrases. This article presents such a method, PCP. PCP is a combined method that, in addition to word meanings and stems, handles synonyms and plural forms, and represents each document as a tree built from fingerprints of the text's word 3-grams. The extracted 3-grams are hashed with the BKDR hash function and stored as the document's fingerprint in a repository of reference-document fingerprints, against which suspicious documents are checked. PCP is evaluated through eight experiments on seven different datasets, each consisting of suspicious and reference documents taken from the Hamshahri newspaper website. The results show that, in detecting similar texts, PCP improves accuracy by 21.15 percent on average compared with the localized "Winnowing" method, and by 31.65 percent on average compared with a language-independent tool.
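The fingerprinting step described in the abstract (word 3-grams hashed with BKDR and stored in a repository of reference fingerprints) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tokenizer is plain whitespace splitting with no stemming, synonym, or plural handling; the BKDR seed of 131 is the value commonly used in the classic C version; the tree representation is not modeled; and the class `FingerprintRepository`, its `check` method, and the containment score it returns are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def bkdr_hash(text: str, seed: int = 131) -> int:
    """BKDR string hash over the UTF-8 bytes of `text`, kept to 31 bits."""
    h = 0
    for byte in text.encode("utf-8"):
        # Emulate 32-bit unsigned overflow, as in the usual C version.
        h = (h * seed + byte) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def word_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping word n-grams (3-grams by default), joined by single spaces."""
    # Assumption: placeholder tokenizer; no stemming, synonym, or plural handling.
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fingerprint(text: str, n: int = 3) -> Set[int]:
    """Set of BKDR hashes of the document's word n-grams."""
    return {bkdr_hash(gram) for gram in word_ngrams(text, n)}


class FingerprintRepository:
    """Hypothetical in-memory store of reference-document fingerprints."""

    def __init__(self) -> None:
        self._docs: Dict[str, Set[int]] = {}

    def add(self, doc_id: str, text: str) -> None:
        """Fingerprint a reference document and store it under `doc_id`."""
        self._docs[doc_id] = fingerprint(text)

    def check(self, suspicious_text: str) -> Dict[str, float]:
        """Hypothetical containment score: the share of the suspicious
        document's fingerprints that also occur in each reference document."""
        fp = fingerprint(suspicious_text)
        if not fp:
            return {doc_id: 0.0 for doc_id in self._docs}
        return {doc_id: len(fp & ref) / len(fp) for doc_id, ref in self._docs.items()}


if __name__ == "__main__":
    repo = FingerprintRepository()
    repo.add("ref-1", "a b c d e")
    print(repo.check("a b c d x"))
```

For example, adding the reference text `"a b c d e"` and checking `"a b c d x"` scores 2/3, since two of the suspicious text's three 3-grams also appear in the reference (barring hash collisions). Hashing the UTF-8 bytes mirrors the byte-wise C formulation and works unchanged for Persian text.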

Similar sources

A Plagiarism Detection Approach Based on SVM for Persian Texts

Plagiarism is defined as an unauthorized act of using or adapting others’ works and ideas without referring to them. Numerous methods have been proposed to detect plagiarism in different languages; however, not a lot has been accomplished in Persian. The present study has utilized statistical and semantic features to determine the functionality of Support Vector Machines (SVMs) in detecting act...

Persian Plagiarism Detection Using Sentence Correlations

This report explains the Persian plagiarism detection system we used to submit our run to the Persian PlagDet competition at FIRE 2016. The system was constructed in four main stages. The first is pre-processing and tokenization. The second is constructing a corpus of sentences from each combined source and suspicious document pair. Each sentence is considered to be a document and represented as a...

Mahak Samim: A Corpus of Persian Academic Texts for Evaluating Plagiarism Detection Systems

In this paper we introduce Mahak Samim, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded. This corpus, which can be used for evaluating plagiarism detection systems, consists of more than five thousand artificial plagiarism cases with various lengths and diverse degrees of obfuscation. The development process and the features of the co...

Fingerprinting: Hash-Based Error Detection in Microprocessors

Today’s commodity processors are tuned primarily for performance and power. As CMOS scaling continues into the deep sub-micron regime, soft errors and device wearout will increasingly jeopardize the reliability of unprotected processor pipelines. To preserve reliable operation, processor cores will require mechanisms to detect errors affecting the control and datapaths. Conventional techniques ...

Shape-Based Plagiarism Detection for Flowchart Figures in Texts

Plagiarism detection is a well-known issue in the academic arena. Copying other people's work is considered a serious offence that needs to be checked. Many plagiarism detection systems, such as Turnitin, have been developed to provide these checks. Most, if not all, discard figures and charts before checking for plagiarism. Discarding figures and charts results in loopholes ...

A Sampling-based Tool for Plagiarism Detection in Student Texts

This paper introduces AntiPlag, an advanced plagiarism detection tool intended for use on student texts. It is capable of both hermetic detection, which scrutinizes only local collections of documents (other students' texts and lecture materials, for example), and web plagiarism detection, in which the aim is to identify instances of plagiarism that have been sourced from the Internet. The main...

Journal

Volume 4, Issue 2

Pages 125-133

Publication date: 2016-07-01
